    Robust Estimation with Discrete Explanatory Variables

    Bounded Influence Regression in the Presence of Heteroskedasticity of Unknown Form

    In a regression model with conditional heteroskedasticity of unknown form, we propose a general class of M-estimators scaled by nonparametric estimates of the conditional standard deviations of the dependent variable. We give regularity conditions under which these estimators are asymptotically equivalent to M-estimators scaled by the true conditional standard deviations. The practical performance of these estimators is investigated through a Monte Carlo experiment.
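
A minimal sketch of the idea, assuming a single regressor, Huber's psi function, an OLS starting fit, and a Gaussian-kernel (Nadaraya-Watson) smooth of the absolute residuals as the nonparametric estimate of the conditional standard deviation. These choices, the bandwidth, and all function names are illustrative assumptions, not the estimators or regularity conditions studied in the paper.

```python
import numpy as np

def nw_smooth(x, v, bandwidth):
    """Nadaraya-Watson (Gaussian kernel) estimate of E[v | x] at each sample point."""
    d = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * d**2)
    return (k @ v) / k.sum(axis=1)

def huber_weights(u, c=1.345):
    """IRLS weights psi(u)/u for Huber's psi with tuning constant c."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def scaled_huber(x, y, bandwidth=0.5, c=1.345, n_iter=100, tol=1e-8):
    """Hypothetical sketch: Huber M-estimate of (intercept, slope) with each
    residual scaled by an estimated conditional standard deviation sigma(x_i)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting values
    abs_resid = np.abs(y - X @ beta)
    # For normal errors E|e| = sigma * sqrt(2/pi), so rescale the smoothed |residuals|
    sigma = nw_smooth(x, abs_resid, bandwidth) * np.sqrt(np.pi / 2)
    for _ in range(n_iter):
        u = (y - X @ beta) / sigma                      # standardized residuals
        # At convergence these weights solve sum_i psi(u_i) x_i / sigma_i = 0
        w = huber_weights(u, c) / sigma**2
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, sigma
```

Because sigma is estimated once from a non-robust OLS fit, gross outliers can inflate it locally; starting from a robust fit (for example, least absolute deviations) would be a natural refinement of this sketch.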

    Yet another breakdown point notion: EFSBP - illustrated at scale-shape models

    The breakdown point in its different variants is one of the central notions for quantifying the global robustness of a procedure. We propose a simple supplementary variant that is useful in situations where there is no obvious equivariance, or only partial equivariance: extending the Donoho and Huber (1983) Finite Sample Breakdown Point, we propose the Expected Finite Sample Breakdown Point, which yields less configuration-dependent values while still preserving the finite-sample aspect of the former definition. We apply this notion to the joint estimation of scale and shape (where only scale equivariance is available), exemplified for the generalized Pareto, generalized extreme value, Weibull, and Gamma distributions. In these settings, we are interested in highly robust, easy-to-compute initial estimators; to this end, we study Pickands-type and Location-Dispersion-type estimators and compute their respective breakdown points. Comment: 21 pages, 4 figures
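
For the generalized Pareto case, one simple Pickands-type construction (sketched here as an assumption, not necessarily the exact estimator studied in the paper) matches two empirical quantiles. With location 0, scale beta and shape xi, the quantile function is Q(p) = (beta/xi)((1 - p)^(-xi) - 1), so Q(3/4)/Q(1/2) = 2^xi + 1. This gives xi = log2(Q(3/4)/Q(1/2) - 1) and beta = xi Q(1/2)/(2^xi - 1), with beta = Q(1/2)/ln 2 in the limit xi -> 0. Because only the median and the upper quartile enter, estimators of this kind are cheap to compute and are natural candidates for robust starting values. Function names are illustrative.

```python
import numpy as np

def pickands_gpd(sample):
    """Quantile-matching (Pickands-type) estimates of GPD shape xi and scale beta,
    assuming location 0. Uses only the sample median and upper quartile."""
    q2, q3 = np.quantile(sample, [0.5, 0.75])
    xi = np.log2(q3 / q2 - 1.0)            # from Q(3/4)/Q(1/2) = 2**xi + 1
    if abs(xi) < 1e-8:
        beta = q2 / np.log(2.0)            # exponential limit xi -> 0
    else:
        beta = xi * q2 / (2.0**xi - 1.0)   # from Q(1/2) = beta * (2**xi - 1) / xi
    return float(xi), float(beta)

def gpd_quantile(p, xi, beta):
    """GPD quantile function with location 0 (xi != 0); used here to simulate data."""
    return beta / xi * ((1.0 - p) ** (-xi) - 1.0)
```

For example, applying `pickands_gpd` to a large sample drawn as `gpd_quantile(rng.uniform(size=n), 0.3, 2.0)` should return estimates close to (0.3, 2.0).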

    Private Drinking Water Wells as a Source of Exposure to Perfluorooctanoic Acid (PFOA) in Communities Surrounding a Fluoropolymer Production Facility

    BACKGROUND: The C8 Health Project was established in 2005 to collect data on perfluorooctanoic acid (PFOA, or C8) and human health in Ohio and West Virginia communities contaminated by a fluoropolymer production facility. OBJECTIVE: We assessed PFOA exposure via contaminated drinking water in a subset of C8 Health Project participants who drank water from private wells. METHODS: Participants provided demographic information and residential, occupational, and medical histories. Laboratory analyses were conducted to determine serum PFOA concentrations. PFOA data were collected from 2001 through 2005 from 62 private drinking water wells. We examined the relationship between PFOA levels in drinking water and in serum using robust regression methods. For comparison with the regression models, we used a first-order, single-compartment pharmacokinetic model to estimate the serum:drinking-water concentration ratio at steady state. RESULTS: The median serum PFOA concentration in 108 study participants who used private wells was 75.7 μg/L, approximately 20 times greater than the levels in the U.S. general population but similar to those of local residents who drank public water. Each 1 μg/L increase in PFOA levels in drinking water was associated with an increase in serum concentrations of 141.5 μg/L (95% confidence interval, 134.9-148.1). The serum:drinking-water concentration ratio for the steady-state pharmacokinetic model was 114. CONCLUSIONS: PFOA-contaminated drinking water is a significant contributor to PFOA levels in serum in the study population. Regression methods and pharmacokinetic modeling produced similar estimates of the relationship.
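
The steady-state ratio from a first-order, single-compartment model follows from setting daily intake equal to daily elimination. The sketch below assumes the standard form C_ss / C_w = (W × f) / (k × V_d × BW), with drinking-water intake W (L/day), absorbed fraction f, elimination rate constant k = ln 2 / half-life, volume of distribution V_d (L/kg) and body weight BW (kg). The function names are hypothetical, and the study's own parameter values are not given in the abstract.

```python
import math

def steady_state_ratio(half_life_days, vd_l_per_kg, water_l_per_day,
                       body_weight_kg, absorbed_fraction=1.0):
    """Serum:drinking-water concentration ratio at steady state for a
    one-compartment model with first-order elimination.

    Steady state: daily intake W * f * C_w equals daily elimination
    k * Vd * BW * C_ss, so C_ss / C_w = W * f / (k * Vd * BW)."""
    k = math.log(2) / half_life_days      # elimination rate constant, 1/day
    return water_l_per_day * absorbed_fraction / (k * vd_l_per_kg * body_weight_kg)

def serum_concentration(t_days, c_water, half_life_days, vd_l_per_kg,
                        water_l_per_day, body_weight_kg, absorbed_fraction=1.0):
    """Serum concentration after t_days of constant exposure, starting from zero."""
    k = math.log(2) / half_life_days
    c_ss = c_water * steady_state_ratio(half_life_days, vd_l_per_kg, water_l_per_day,
                                        body_weight_kg, absorbed_fraction)
    return c_ss * (1.0 - math.exp(-k * t_days))   # exponential approach to C_ss
```

For example, placeholder inputs of a 3-year half-life, V_d = 0.17 L/kg, 1.5 L/day of water and 70 kg body weight give a ratio of roughly 200. The result is sensitive to the assumed half-life and intake, so this number should not be compared directly with the study's estimate of 114.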

    Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    Background: Multiple imputation (MI) provides an effective approach to handling missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within- and between-imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. When any method was stated, the standard approach was Rubin's rules applied without any transformation. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies.
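
Rubin's rules themselves are short: with m imputations, the point estimates are pooled by their mean, and the average within-imputation variance U is combined with the between-imputation variance B as T = U + (1 + 1/m)B. The sketch below also shows one way to transform first, here pooling survival probabilities on the complementary log-log scale with delta-method variances and back-transforming the pooled estimate. The choice of transformation and the function names are illustrative assumptions, not the guidelines proposed in the paper.

```python
import math
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b        # total variance
    return q_bar, t

def pool_survival_cloglog(surv, var_surv):
    """Pool survival probabilities 0 < S < 1 on the log(-log S) scale, then back-transform."""
    g = [math.log(-math.log(s)) for s in surv]
    # Delta method: d/dS log(-log S) = 1 / (S log S)
    var_g = [v / (s * math.log(s)) ** 2 for s, v in zip(surv, var_surv)]
    g_bar, t_g = rubin_pool(g, var_g)
    # Pooled S on the probability scale; total variance stays on the cloglog scale
    return math.exp(-math.exp(g_bar)), t_g
```

Confidence limits would be formed on the transformed scale from `t_g` and then back-transformed in the same way as the point estimate.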

    On the Schoenberg Transformations in Data Analysis: Theory and Illustrations

    The class of Schoenberg transformations, embedding Euclidean distances into higher-dimensional Euclidean spaces, is presented and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed and visualized on artificial data sets by classical multidimensional scaling. A simple distance-based discriminant algorithm illustrates the theory, which is intimately connected to the Gaussian kernels of machine learning.
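
As an illustration of the embedding property, the sketch applies phi(D) = (1 - exp(-lambda D))/lambda, a Gaussian-kernel-type transformation assumed here as one representative member of the class, to squared Euclidean distances and recovers coordinates by classical multidimensional scaling. The double-centered matrix B = -1/2 J phi(D) J should be positive semidefinite, which is the numerical signature that the transformed dissimilarities are again squared Euclidean distances. Function names and the value of lambda are illustrative.

```python
import numpy as np

def squared_distances(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def schoenberg_gaussian(D, lam=1.0):
    """phi(D) = (1 - exp(-lam * D)) / lam, applied elementwise to squared distances."""
    return (1.0 - np.exp(-lam * D)) / lam

def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS of a squared-distance matrix D.
    Returns coordinates and the eigenvalues of B = -1/2 J D J in decreasing order."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ D @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Clip tiny negative eigenvalues caused by rounding before taking square roots
    coords = vecs[:, :dim] * np.sqrt(np.maximum(vals[:dim], 0.0))
    return coords, vals
```

Because exp(-lambda D) is a positive definite Gaussian kernel matrix, B here equals J exp(-lambda D) J / (2 lambda), so its eigenvalues are nonnegative up to rounding, and keeping all of them reproduces phi(D) exactly.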

    Assessing Levels of Attention Using Low Cost Eye Tracking

    The emergence of mobile eye trackers embedded in next-generation smartphones or VR displays will make it possible to trace not only which objects we look at but also our level of attention in a given situation. To explore whether we can quantify the engagement of a user interacting with a laptop, we apply mobile eye tracking in an in-depth study over 2 weeks with nearly 10,000 observations to assess pupil size changes related to the attentional aspects of alertness, orientation and conflict resolution. By visually presenting conflicting cues and targets, we test the hypothesis that it is feasible to measure the effort allocated when responding to confusing stimuli. Although such experiments are normally carried out in a lab, we are able to differentiate between sustained alertness and complex decision making even with low-cost eye tracking "in the wild". From a quantified-self perspective of individual behavioral adaptation, the correlations between pupil size and the task-dependent reaction times and error rates may, in the longer term, provide a foundation for adapting smartphone content and interaction to the user's perceived level of attention. Comment: 12 pages, 6 figures, 2 tables. The final publication will be available at Springer via http://dx.doi.org/DOIxxx, when published as part of the HCI International 2016 Conference Proceeding

    Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    BACKGROUND: Data mining can be used to automate the analysis of the substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), thus leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting patterns. METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying it as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated the statistical significance of the scores. CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of knowledge, thus reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.
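
The abstract does not specify how pattern strength or the surprise score is computed. As a hypothetical sketch only, assume each source yields a 2x2 co-occurrence table for a pattern (for example, disease present/absent by lab result present/absent in the patient database, and the corresponding co-citation counts in the knowledgebase), measure strength as a log odds ratio, and score surprise as the standardized difference between the two strengths, with a two-sided normal p value. The functions and table layout are illustrative, not the authors' Perl/R implementation.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for the 2x2 table [[a, b], [c, d]],
    with 0.5 added to every cell so that empty cells do not break the logarithm."""
    a, b, c, d = (n + 0.5 for n in (a, b, c, d))
    return math.log(a * d / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def surprise_score(db_table, kb_table):
    """Hypothetical surprise score: |z| for the difference between the database and
    knowledgebase log odds ratios, together with its two-sided normal p value."""
    lor_db, se_db = log_odds_ratio(*db_table)
    lor_kb, se_kb = log_odds_ratio(*kb_table)
    z = (lor_db - lor_kb) / math.sqrt(se_db**2 + se_kb**2)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p value under a standard normal
    return abs(z), p
```

Under this sketch, patterns whose two sources agree score near zero, and patterns would be ranked by score and kept only if their p value falls below a chosen significance level.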

    Fish Consumption and Mercury Exposure among Louisiana Recreational Anglers

    BACKGROUND: Methylmercury (MeHg) exposure assessments among average fish consumers in the United States may underestimate exposures among U.S. subpopulations with high intakes of regionally specific fish. OBJECTIVES: We examined relationships among fish consumption, estimated mercury (Hg) intake, and measured Hg exposure within one such potentially highly exposed group, recreational anglers in the state of Louisiana, USA. METHODS: We surveyed 534 anglers in 2006 using interviews at boat launches and fishing tournaments combined with an Internet-based survey method. Hair samples from 402 of these anglers were collected and analyzed for total Hg. Questionnaires provided information on species-specific fish consumption during the 3 months before the survey. RESULTS: Anglers’ median hair Hg concentration was 0.81 μg/g (n = 398; range, 0.02–10.7 μg/g); 40% of participants had levels >1 μg/g, which approximately corresponds to the U.S. Environmental Protection Agency’s reference dose. Fish consumption and Hg intake were significantly positively associated with hair Hg. Participants reported consuming nearly 80 different fish types, many of which are specific to the region. Unlike the general U.S. population, which acquires most of its Hg from commercial seafood sources, approximately 64% of participants’ fish meals and 74% of their estimated Hg intake came from recreationally caught seafood. CONCLUSIONS: Study participants had relatively elevated hair Hg concentrations and reported consumption of a wide variety of fish, particularly locally caught fish. This group represents a highly exposed subpopulation with an exposure profile that differs from that of fish consumers in other regions of the United States, suggesting a need for more regionally specific exposure estimates and public health advisories. ISSN: 1552-9924; ISSN: 0091-676
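
A common way to estimate Hg intake from consumption questionnaires sums, over reported fish meals, meal size times a species-specific Hg concentration, then divides by body weight and the recall period; whether the study used exactly this form is an assumption. In the sketch below the record fields, the 90-day window standing in for "3 months", and all numbers are hypothetical. The comparison constant is the U.S. EPA methylmercury reference dose of 0.1 μg per kg body weight per day.

```python
from dataclasses import dataclass

EPA_MEHG_RFD = 0.1  # U.S. EPA methylmercury reference dose, ug per kg body weight per day

@dataclass
class FishMeals:
    species: str
    meals: int             # meals reported over the recall period
    meal_size_g: float     # assumed grams of fish per meal
    hg_ug_per_g: float     # assumed Hg concentration for this species
    recreational: bool     # True if self-caught rather than commercial

def hg_intake(records, body_weight_kg, recall_days=90):
    """Average daily Hg intake in ug/kg/day over the (hypothetical) recall period."""
    total_ug = sum(r.meals * r.meal_size_g * r.hg_ug_per_g for r in records)
    return total_ug / (body_weight_kg * recall_days)

def recreational_share(records):
    """Fraction of total estimated Hg intake coming from recreationally caught fish."""
    total = sum(r.meals * r.meal_size_g * r.hg_ug_per_g for r in records)
    rec = sum(r.meals * r.meal_size_g * r.hg_ug_per_g for r in records if r.recreational)
    return rec / total
```

For example, 12 self-caught meals of 150 g at 0.2 μg/g plus 6 commercial meals of 150 g at 0.4 μg/g, for an 80 kg angler over 90 days, give an intake of about 0.1 μg/kg/day (at the reference dose), half of it from recreational catch.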